Linguistic aggregation methods in blog retrieval

نویسندگان

  • Mostafa Keikha
  • Fabio Crestani
چکیده

This paper addresses the blog distillation problem, that is, given a user query find the blogs that are most related to the query topic. We model each post as evidence of the relevance of a blog to the query, and use aggregation methods like Ordered Weighted Averaging (OWA) operators to combine the evidence. We show that using only highly relevant evidence (posts) for each blog can result in an effective retrieval system. We also take into account the importance of the posts in a query-based cluster and investigate its effect in the aggregation results. We use prioritized OWA operators and show that considering the importance is effective when the number of aggregated posts from each blog is high. We carry out our experiments on three different data sets (TREC07, TREC08 and TREC09) and show statistically significant improvements over state of the art model called Voting Model.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving Blog Polarity Classification via Topic Analysis and Adaptive Methods

In this paper we examine different linguistic features for sentimental polarity classification, and perform a comparative study on this task between blog and review data. We found that results on blog are much worse than reviews and investigated two methods to improve the performance on blogs. First we explored information retrieval based topic analysis to extract relevant sentences to the give...

متن کامل

Effectiveness of Aggregation Methods in Blog Distillation

This paper addresses the blog distillation problem, that is, given a user query find the blogs that are most related to the query topic. We model each post as evidence of the relevance of a blog to the query, and use aggregation methods like Ordered Weighted Averaging operators to combine the evidence. We show that using only highly relevant evidence (posts) for each blog can result in an effec...

متن کامل

Blog Classification: Adding Linguistic Knowledge to Improve the K-NN Algorithm

Blogs are interactive and regularly updated websites which can be seen as diaries. These websites are composed by articles based on distinct topics. Thus, it is necessary to develop Information Retrieval approaches for this new web knowledge. The first important step of this process is the categorization of the articles. The paper above compares several methods using linguistic knowledge with k...

متن کامل

Arithmetic Aggregation Operators for Interval-valued Intuitionistic Linguistic Variables and Application to Multi-attribute Group Decision Making

The intuitionistic linguistic set (ILS) is an extension of linguisitc variable. To overcome the drawback of using single real number to represent membership degree and non-membership degree for ILS, the concept of interval-valued intuitionistic linguistic set (IVILS) is introduced through representing the membership degree and non-membership degree with intervals for ILS in this paper. The oper...

متن کامل

Analyzing the Sense Distribution of Concordances Obtained by Web as Corpus Approach

In corpus-based lexicography and natural language processing fields some authors have proposed using the Internet as a source of corpora for obtaining concordances of words. Most techniques implemented with this method are based on information retrieval-oriented web searchers. However, rankings of concordances obtained by these search engines are not built according to linguistic criteria but t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Inf. Process. Manage.

دوره 48  شماره 

صفحات  -

تاریخ انتشار 2012